Co-training algorithm with combination of active learning and density peak clustering
GONG Yanlu, LYU Jia
Journal of Computer Applications    2019, 39 (8): 2297-2301.   DOI: 10.11772/j.issn.1001-9081.2019010075
Abstract
High-ambiguity samples are easily mislabeled by the co-training algorithm, which lowers classifier accuracy, and the unlabeled samples added in each iteration do not carry enough of the useful information hidden in the unlabeled data. To address these problems, a co-training algorithm combining active learning and density peak clustering was proposed. Before each iteration, the unlabeled samples with high ambiguity were selected, actively labeled, and added to the labeled sample set; density peak clustering was then applied to the unlabeled samples to obtain the local density and relative distance of each unlabeled sample. During each iteration, the unlabeled samples with higher density and larger relative distance were selected and trained with the Naive Bayes (NB) classification algorithm, and the process was repeated until the termination condition was satisfied. Actively labeling high-ambiguity samples alleviates the mislabeling problem, while density peak clustering selects the samples that best reflect the structure of the data space. Experimental results on 8 UCI datasets and the pima dataset from Kaggle show that, compared with the SSLNBCA (Semi-Supervised Learning combining NB Co-training with Active learning) algorithm, the accuracy of the proposed algorithm improves by up to 6.67 percentage points, with an average improvement of 1.46 percentage points.
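The following is a minimal Python sketch of the training loop described in the abstract, not the authors' implementation. It assumes a single Gaussian Naive Bayes learner (the paper's co-training variant would maintain two view-specific classifiers), and the function names, the cutoff, and the selection sizes are illustrative; the oracle argument stands in for the human annotator queried by active learning.

import numpy as np
from sklearn.naive_bayes import GaussianNB


def density_peak_scores(X, cutoff=1.0):
    # Local density (rho) and distance to the nearest denser point (delta),
    # following the density peak clustering idea of Rodriguez and Laio.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(dist / cutoff) ** 2).sum(axis=1) - 1.0  # exclude self
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = np.where(rho > rho[i])[0]
        delta[i] = dist[i, denser].min() if len(denser) else dist[i].max()
    return rho, delta


def cotrain_al_dpc(X_l, y_l, X_u, oracle, n_iter=10, n_query=5, n_add=10):
    # oracle(X) returns ground-truth labels for queried samples (active learning).
    clf = GaussianNB().fit(X_l, y_l)
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        proba = clf.predict_proba(X_u)
        top2 = np.sort(proba, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]          # small margin = high ambiguity
        ambiguous = np.argsort(margin)[:n_query]
        # 1) Actively label the most ambiguous samples and add them to the labeled set.
        X_l = np.vstack([X_l, X_u[ambiguous]])
        y_l = np.concatenate([y_l, oracle(X_u[ambiguous])])
        # 2) Density peak scores on the remaining pool: prefer samples that are
        #    both dense and far from any denser point.
        keep = np.setdiff1d(np.arange(len(X_u)), ambiguous)
        rho, delta = density_peak_scores(X_u[keep])
        chosen = keep[np.argsort(rho * delta)[-n_add:]]
        # 3) Pseudo-label the chosen samples with the current classifier.
        X_l = np.vstack([X_l, X_u[chosen]])
        y_l = np.concatenate([y_l, clf.predict(X_u[chosen])])
        X_u = np.delete(X_u, np.concatenate([ambiguous, chosen]), axis=0)
        clf = GaussianNB().fit(X_l, y_l)
    return clf

A call such as cotrain_al_dpc(X_l, y_l, X_u, oracle=lambda X: annotate(X)) would run the loop, where annotate is a hypothetical stand-in for the human labeling step.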